R.version
## _
## platform x86_64-apple-darwin17.0
## arch x86_64
## os darwin17.0
## system x86_64, darwin17.0
## status
## major 4
## minor 0.4
## year 2021
## month 02
## day 15
## svn rev 80002
## language R
## version.string R version 4.0.4 (2021-02-15)
## nickname Lost Library Book
packageVersion("DESeq2") #1.30.1
packageVersion("ggplot2") #3.3.3
packageVersion("affycoretools") #1.62.0
packageVersion("arrayQualityMetrics") #3.46.0
packageVersion("genefilter") #1.72.1
packageVersion("Biobase") #2.50.0
packageVersion("dplyr") #1.0.5
packageVersion("pheatmap") #1.0.12
packageVersion("vegan") #2.5.7
packageVersion("ggrepel") #0.9.1
packageVersion("tidyverse") #1.3.0
For our DESeq analysis, we used gene expression data originally published with the study "Gene expression of endangered coral (Orbicella spp.) in Flower Garden Banks National Marine Sanctuary after Hurricane Harvey, published in Frontiers in Marine Science in 2019 by R.M. Wright, A.M.S. Correa, L.A. Quigley, L.Z. Santiago-Vazquez, K.E.F. Shamberger and S.W. Davies.
In late August, 2017, Hurricane Harvey– a category 4 hurricane– traveled over the Gulf of Mexico, and made landfall in coastal Texas, where the storm dumped 33 trillion gallons of freshwater (van Oldenborgh et al., 2017). Harvey caused significant damage on land, and heavy freshwater input into the Gulf of Mexico off of the Texas coast.
satellite animation of Hurricane Harvey over the Texas Coast in August, 2017
Flower Garden Banks (FGB) National Marine Sanctuary is located 190kn off the coast of the Texas-Louisiana border, and is one of the few global reef systems that has retained the majority of its coral cover without loss to disease and bleaching. However, with the influx of freshwater from Harvey, it was hypothesized that decreased salinity would lead to a die-off event.
In the case of FGB after Harvey, the freshwater influx led to a period of sub-lethal stress rather than mortality. This allowed Wright et. al. to investigate differential gene expression in periods of sub-lethal stress and subsequent recovery.
For this study, Wright et. al. sampled two species of Orbicella, Orbicella faveolata and Orbicella franksi from the East and West FBG in September and October 2017. Using salinity measurements from the Texas Automated Buoy System taken from nearby FGB, samples collected from September were categorized as being under “sub-lethal stress,” and samples collected from October were categorized as being in “recovery.” A total of 47 tissue samples from the same set of tagged Orbicella colonies were collected under both conditions. After collection, RNA was isolated, amplified, and barcoded and sequenced. Differential gene expression was identified using the DESeq2 in R.
Orbicella faveolata, one of two Orbicella species researched by Wright et al. 2019
For our analysis, we used a subsection of the study’s data from the O. faveolata samples. We then ran the DESeq2 and GO enrichment pipelines. A step-by-step version of our analyses can be found below.
set the working directory
setwd("/Users/elsa/Desktop/BI_586/Assign2_Coral_FW")
BiocManager::install("DESeq2")
install.packages("backports")
install.packages("caTools")
BiocManager::install("affycoretools")
BiocManager::install("arrayQualityMetrics")
install.packages("pheatmap")
install.packages("tidyverse")
Load the required packages.
library(DESeq2)
## Loading required package: S4Vectors
## Loading required package: stats4
## Loading required package: BiocGenerics
## Loading required package: parallel
##
## Attaching package: 'BiocGenerics'
## The following objects are masked from 'package:parallel':
##
## clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
## clusterExport, clusterMap, parApply, parCapply, parLapply,
## parLapplyLB, parRapply, parSapply, parSapplyLB
## The following objects are masked from 'package:stats':
##
## IQR, mad, sd, var, xtabs
## The following objects are masked from 'package:base':
##
## anyDuplicated, append, as.data.frame, basename, cbind, colnames,
## dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
## grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
## order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
## rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
## union, unique, unsplit, which.max, which.min
##
## Attaching package: 'S4Vectors'
## The following object is masked from 'package:base':
##
## expand.grid
## Loading required package: IRanges
## Loading required package: GenomicRanges
## Loading required package: GenomeInfoDb
## Loading required package: SummarizedExperiment
## Loading required package: MatrixGenerics
## Loading required package: matrixStats
##
## Attaching package: 'MatrixGenerics'
## The following objects are masked from 'package:matrixStats':
##
## colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
## colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
## colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
## colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
## colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
## colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
## colWeightedMeans, colWeightedMedians, colWeightedSds,
## colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
## rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
## rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
## rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
## rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
## rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
## rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
## rowWeightedSds, rowWeightedVars
## Loading required package: Biobase
## Welcome to Bioconductor
##
## Vignettes contain introductory material; view with
## 'browseVignettes()'. To cite Bioconductor, see
## 'citation("Biobase")', and for packages 'citation("pkgname")'.
##
## Attaching package: 'Biobase'
## The following object is masked from 'package:MatrixGenerics':
##
## rowMedians
## The following objects are masked from 'package:matrixStats':
##
## anyMissing, rowMedians
library(affycoretools)
##
## Registered S3 method overwritten by 'GGally':
## method from
## +.gg ggplot2
##
library(arrayQualityMetrics)
library(genefilter)
##
## Attaching package: 'genefilter'
## The following objects are masked from 'package:MatrixGenerics':
##
## rowSds, rowVars
## The following objects are masked from 'package:matrixStats':
##
## rowSds, rowVars
library(Biobase)
allcounts_Harvey_fav.txt file is from BI586 GitHub. To make a count data .txt, follow the taqseq processing protocol.
countData <- read.table("allcounts_Harvey_fav.txt")
head(countData)
## fav_recoveryA fav_recoveryB fav_recoveryC fav_stressA
## isogroup0 70 5 59 26
## isogroup100027 0 0 0 0
## isogroup100029 0 0 0 1
## isogroup100033 0 0 0 0
## isogroup10004 5 0 0 3
## isogroup100049 4 2 6 10
## fav_stressB fav_stressC
## isogroup0 9 6
## isogroup100027 1 0
## isogroup100029 0 0
## isogroup100033 0 0
## isogroup10004 0 2
## isogroup100049 1 0
length(countData[,1])
## [1] 19937
# set names for countData
names(countData)=c("fav_recoveryA", "fav_recoveryB", "fav_recoveryC","fav_stressA", "fav_stressB", "fav_stressC")
#row.names(countData)=sub("", "isogroup", rownames(countData)) We don't need to modify the row names
head(countData)
## fav_recoveryA fav_recoveryB fav_recoveryC fav_stressA
## isogroup0 70 5 59 26
## isogroup100027 0 0 0 0
## isogroup100029 0 0 0 1
## isogroup100033 0 0 0 0
## isogroup10004 5 0 0 3
## isogroup100049 4 2 6 10
## fav_stressB fav_stressC
## isogroup0 9 6
## isogroup100027 1 0
## isogroup100029 0 0
## isogroup100033 0 0
## isogroup10004 0 2
## isogroup100049 1 0
Set the directory.
setwd("/Users/elsa/Desktop/BI_586/Assign2_Coral_FW")
v=setwd("/Users/elsa/Desktop/BI_586/Assign2_Coral_FW")
Create a model using DESeq and normalizing the data.
treat=c("fav_recovery", "fav_recovery", "fav_recovery","fav_stress", "fav_stress", "fav_stress")
replicate = c("A", "B", "C", "A", "B", "C")
g = data.frame(cbind(treat, replicate))
g
## treat replicate
## 1 fav_recovery A
## 2 fav_recovery B
## 3 fav_recovery C
## 4 fav_stress A
## 5 fav_stress B
## 6 fav_stress C
colData= g
# create a model using DESeq too see how our design is varied by treatment
dds=DESeqDataSetFromMatrix(countData=countData,
colData = g,
design = ~treat)
## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors
# normalizing the data
vsd.ge=assay(vst(dds))
rl=vst(dds)
e=ExpressionSet(assay(rl), AnnotatedDataFrame(as.data.frame(colData(rl))))
arrayQualityMetrics(e,outdir=v,intgroup=c("treat"),force=T)
## The report will be written into directory '/Users/elsa/Desktop/BI_586/Assign2_Coral_FW'.
## Warning in svgStyleAttributes(style, svgdev): Removing non-SVG style attribute
## name(s): subscripts, group.number, group.value
## Warning in svgStyleAttributes(style, svgdev): Removing non-SVG style attribute
## name(s): subscripts, group.number, group.value
## Warning in svgStyleAttributes(style, svgdev): Removing non-SVG style attribute
## name(s): subscripts, group.number, group.value
## Warning in svgStyleAttributes(style, svgdev): Removing non-SVG style attribute
## name(s): subscripts, group.number, group.value
## Warning in svgStyleAttributes(style, svgdev): Removing non-SVG style attribute
## name(s): subscripts, group.number, group.value
## Warning in svgStyleAttributes(style, svgdev): Removing non-SVG style attribute
## name(s): subscripts, group.number, group.value
Now we can go over to our Outliers directory and take a look at the index.html file. We didn’t detect any outliers, but if there was an outlier you would have to remove them.
Since we don’t have to run through the outlier chunk of code every time, let’s read in counts data again.
countData <- read.table("allcounts_Harvey_fav.txt")
head(countData)
## fav_recoveryA fav_recoveryB fav_recoveryC fav_stressA
## isogroup0 70 5 59 26
## isogroup100027 0 0 0 0
## isogroup100029 0 0 0 1
## isogroup100033 0 0 0 0
## isogroup10004 5 0 0 3
## isogroup100049 4 2 6 10
## fav_stressB fav_stressC
## isogroup0 9 6
## isogroup100027 1 0
## isogroup100029 0 0
## isogroup100033 0 0
## isogroup10004 0 2
## isogroup100049 1 0
length(countData[,1])
## [1] 19937
names(countData)=c("fav_recoveryA", "fav_recoveryB", "fav_recoveryC","fav_stressA", "fav_stressB", "fav_stressC")
head(countData)
## fav_recoveryA fav_recoveryB fav_recoveryC fav_stressA
## isogroup0 70 5 59 26
## isogroup100027 0 0 0 0
## isogroup100029 0 0 0 1
## isogroup100033 0 0 0 0
## isogroup10004 5 0 0 3
## isogroup100049 4 2 6 10
## fav_stressB fav_stressC
## isogroup0 9 6
## isogroup100027 1 0
## isogroup100029 0 0
## isogroup100033 0 0
## isogroup10004 0 2
## isogroup100049 1 0
Let’s generate a table and visualize the total counts.
totalCounts=colSums(countData)
totalCounts
## fav_recoveryA fav_recoveryB fav_recoveryC fav_stressA fav_stressB
## 6596244 1263479 1665705 970497 771267
## fav_stressC
## 458772
barplot(totalCounts, col=c("coral", "coral", "coral", "red", "red", "red"), ylab="raw counts", main = "Total Counts of Recovery vs. Stress")
Looks like the total counts vary a little bit. Let’s take a look at the counts in numbers.
min(totalCounts) #458772
## [1] 458772
max(totalCounts) #6596244
## [1] 6596244
Create a model using DESeq and normalizing the data.
treat=c("fav_recovery", "fav_recovery", "fav_recovery","fav_stress", "fav_stress", "fav_stress")
replicate = c("A", "B", "C", "A", "B", "C")
g = data.frame(cbind(treat, replicate))
g
## treat replicate
## 1 fav_recovery A
## 2 fav_recovery B
## 3 fav_recovery C
## 4 fav_stress A
## 5 fav_stress B
## 6 fav_stress C
colData<- g
dds<-DESeqDataSetFromMatrix(countData=countData, colData=colData, design=~treat+replicate) #can only test for the main effects of site, pco2, temp
## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors
#one step DESeq
dds<-DESeq(dds)
## estimating size factors
## estimating dispersions
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## fitting model and testing
Let’s take a look at our DESeq result.
head(dds)
## class: DESeqDataSet
## dim: 6 6
## metadata(1): version
## assays(4): counts mu H cooks
## rownames(6): isogroup0 isogroup100027 ... isogroup10004 isogroup100049
## rowData names(30): baseMean baseVar ... deviance maxCooks
## colnames(6): fav_recoveryA fav_recoveryB ... fav_stressB fav_stressC
## colData names(3): treat replicate sizeFactor
res<- results(dds)
Now let’s plot the dispersions, which is looking at what DESeq is doing to our data. They do so by fitting our data into this curve.
plotDispEsts(dds, main="Dispersion plot Corals")
Dispersion look like a hokey stick, which is good! ### Pairwise comparasion of stress vs. recovery The order matters. Make sure that you put treatment first and control second.
colData$treat<-factor(colData$treat, levels=c("fav_stress", "fav_recovery"))
resstress <- results(dds, contrast=c("treat","fav_stress", "fav_recovery"))
Let’s take a look at the number of genes that meet the different thresholds. padj (p-adjusted) takes in account of the comparasions, rather than just looking at one object like p-value. If we’re looking at a large amount of gene, it’s easy to get a false positive just by chance. We have to do multiple test corrections to get the padj value.
# number of differentially genes
#how many FDR < 10%?
table(resstress$padj<0.1)
##
## FALSE TRUE
## 4538 46
# 0.1=46
# 0.05=28
# 0.01=19
summary(resstress)
##
## out of 16976 with nonzero total read count
## adjusted p-value < 0.1
## LFC > 0 (up) : 31, 0.18%
## LFC < 0 (down) : 15, 0.088%
## outliers [1] : 0, 0%
## low counts [2] : 12392, 73%
## (mean count < 13)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results
There are 12392 samples removed because of low counts. Only 15 genes are down regulated under stress relative to recovery, and 31 genes are up regulated with the padj < 0.2.
Let’s look at numbers of differential express genes that pass a threshold of 0.05. We find 28
nrow(resstress[resstress$padj<0.05 & !is.na(resstress$padj),])
## [1] 28
Now let’s make a MA plot and save the result as a dataframe.
plotMA(resstress, main="Stress vs Recovery")
plotMA(resstress, main="Stress vs Recovery", ylim=c(-6,6)) # look at the same plot with differet ylim
results <- as.data.frame(resstress)
head(results) # somehow isogoup is doubled. Stil not sure how to fix this.
## baseMean log2FoldChange lfcSE stat pvalue
## isogroup0 19.2036849 -0.1794010 0.9219963 -0.1945788 0.8457227
## isogroup100027 0.2643740 1.8990472 4.0518643 0.4686848 0.6392950
## isogroup100029 0.1334307 2.0344289 4.0789143 0.4987673 0.6179434
## isogroup100033 0.0000000 NA NA NA NA
## isogroup10004 1.3084957 1.9497428 3.2262365 0.6043397 0.5456178
## isogroup100049 3.1794868 -0.6266369 2.0522492 -0.3053415 0.7601060
## padj
## isogroup0 0.9953109
## isogroup100027 NA
## isogroup100029 NA
## isogroup100033 NA
## isogroup10004 NA
## isogroup100049 NA
Get the number of genes that are up/down regulated. We find 31 up and 15 down
nrow(resstress[resstress$padj<0.1 & resstress$log2FoldChange > 0 & !is.na(resstress$padj),])
## [1] 31
nrow(resstress[resstress$padj<0.1 & resstress$log2FoldChange < 0 & !is.na(resstress$padj),])
## [1] 15
Write a table of the results, and save to your directory as a .csv file
write.table(resstress, file="stress_Harvey.txt", quote=F, sep="\t")
cd <- read.table("stress_Harvey.txt")
head(cd)
## baseMean log2FoldChange lfcSE stat pvalue
## isogroup0 19.2036849 -0.1794010 0.9219963 -0.1945788 0.8457227
## isogroup100027 0.2643740 1.8990472 4.0518643 0.4686848 0.6392950
## isogroup100029 0.1334307 2.0344289 4.0789143 0.4987673 0.6179434
## isogroup100033 0.0000000 NA NA NA NA
## isogroup10004 1.3084957 1.9497428 3.2262365 0.6043397 0.5456178
## isogroup100049 3.1794868 -0.6266369 2.0522492 -0.3053415 0.7601060
## padj
## isogroup0 0.9953109
## isogroup100027 NA
## isogroup100029 NA
## isogroup100033 NA
## isogroup10004 NA
## isogroup100049 NA
The following chunk creates a .csv file of the GO table for MWU (later)
library(dplyr)
cd
go_input_stress = cd %>%
tibble::rownames_to_column(var = "iso") %>%
mutate(mutated_p = -log(pvalue)) %>%
mutate(mutated_p_updown = ifelse(log2FoldChange < 0, mutated_p*-1, mutated_p*1)) %>%
na.omit() %>%
select(iso, mutated_p_updown)
head(go_input_stress)
colnames(go_input_stress) <- c("gene", "pval")
head(go_input_stress)
write.csv(go_input_stress, file="stress_GO.csv", quote=F, row.names=FALSE)
This next chunk gets the p values and creates a table of the rlogdata and p values
valstress=cbind(resstress$pvalue, resstress$padj)
colnames(valstress)=c("pval.stress", "padj.stress")
length(valstress[,1])
## [1] 19937
table(complete.cases(valstress))
##
## FALSE TRUE
## 15353 4584
rlog=rlogTransformation(dds, blind=TRUE)
rld=assay(rlog)
colnames(rld)=paste(colData$treat)
length(rld[,1])
## [1] 19937
rldpvals=cbind(rld,valstress)
dim(rldpvals)
## [1] 19937 8
# [1] 19937 8
table(complete.cases(rldpvals))
##
## FALSE TRUE
## 15353 4584
#FALSE TRUE
#15353 4584
write.csv(rldpvals, "fav_Harvey_RLDandPVALS.csv", quote=F)
colnames(rld)=paste(colData$treat)
This chunk creates an improved sample distance heatmap of overall gene expression differences between the stress and recovery groups
rldpvals <- read.csv(file="fav_Harvey_RLDandPVALS.csv", row.names=1)
head(rldpvals)
## fav_recovery fav_recovery.1 fav_recovery.2 fav_stress
## isogroup0 3.86719166 3.7491423 5.0074464 4.2935649
## isogroup100027 -1.85901635 -1.7377292 -1.8297627 -1.8207768
## isogroup100029 -1.79322273 -1.7293472 -1.7807991 -1.6844258
## isogroup100033 0.00000000 0.0000000 0.0000000 0.0000000
## isogroup10004 0.04380176 -0.1381963 -0.2723896 0.3921522
## isogroup100049 0.87721463 1.6645507 1.6893611 2.1251942
## fav_stress.1 fav_stress.2 pval.stress padj.stress
## isogroup0 3.9490624 3.9086622 0.8457227 0.9953109
## isogroup100027 -1.6015120 -1.7362945 0.6392950 NA
## isogroup100029 -1.7445438 -1.7285451 0.6179434 NA
## isogroup100033 0.0000000 0.0000000 NA NA
## isogroup10004 -0.2014918 0.6506082 0.5456178 NA
## isogroup100049 1.2066803 0.8932394 0.7601060 NA
rld=rldpvals[,1:6]
head(rld)
## fav_recovery fav_recovery.1 fav_recovery.2 fav_stress
## isogroup0 3.86719166 3.7491423 5.0074464 4.2935649
## isogroup100027 -1.85901635 -1.7377292 -1.8297627 -1.8207768
## isogroup100029 -1.79322273 -1.7293472 -1.7807991 -1.6844258
## isogroup100033 0.00000000 0.0000000 0.0000000 0.0000000
## isogroup10004 0.04380176 -0.1381963 -0.2723896 0.3921522
## isogroup100049 0.87721463 1.6645507 1.6893611 2.1251942
## fav_stress.1 fav_stress.2
## isogroup0 3.9490624 3.9086622
## isogroup100027 -1.6015120 -1.7362945
## isogroup100029 -1.7445438 -1.7285451
## isogroup100033 0.0000000 0.0000000
## isogroup10004 -0.2014918 0.6506082
## isogroup100049 1.2066803 0.8932394
sampleDists <- dist(t(rld))
sampleDistMatrix <- as.matrix( sampleDists )
treat=c( "fav_recoveryA", "fav_recoveryB", "fav_recoveryC", "fav_stressA", "fav_stressB", "fav_stressC")
colnames(sampleDistMatrix)=paste(treat)
rownames(sampleDistMatrix)=paste(treat)
library("pheatmap")
heat.colors = colorRampPalette(rev(c("blue","yellow","red")),bias=0.3)(100)
pheatmap(sampleDistMatrix,color = heat.colors,cex=0.9,border_color=NA,cluster_rows=T,cluster_cols=T)
## Found more than one class "unit" in cache; using the first, from namespace 'ggbio'
## Also defined by 'hexbin'
## Found more than one class "simpleUnit" in cache; using the first, from namespace 'ggbio'
## Also defined by 'hexbin'
## Found more than one class "simpleUnit" in cache; using the first, from namespace 'ggbio'
## Also defined by 'hexbin'
Conduct principal component analysis and create PCA biplot to show variation in gene expression
library(vegan)
## Loading required package: permute
## Loading required package: lattice
## This is vegan 2.5-7
library(ggplot2)
library(ggrepel)
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ tibble 3.1.0 ✓ dplyr 1.0.5
## ✓ tidyr 1.1.3 ✓ stringr 1.4.0
## ✓ readr 1.4.0 ✓ forcats 0.5.1
## ✓ purrr 0.3.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::collapse() masks IRanges::collapse()
## x dplyr::combine() masks Biobase::combine(), BiocGenerics::combine()
## x dplyr::count() masks matrixStats::count()
## x dplyr::desc() masks IRanges::desc()
## x tidyr::expand() masks S4Vectors::expand()
## x dplyr::filter() masks stats::filter()
## x dplyr::first() masks S4Vectors::first()
## x dplyr::lag() masks stats::lag()
## x ggplot2::Position() masks BiocGenerics::Position(), base::Position()
## x purrr::reduce() masks GenomicRanges::reduce(), IRanges::reduce()
## x dplyr::rename() masks S4Vectors::rename()
## x dplyr::slice() masks IRanges::slice()
## x readr::spec() masks genefilter::spec()
rld_t=t(rld)
pca <- prcomp(rld_t,center = TRUE, scale = FALSE)
li <- pca$sdev^2 / sum(pca$sdev^2)
pc1v <- round(li[1] * 100, 1)
pc2v <- round(li[2] * 100, 1)
pca_s <- as.data.frame(pca$x)
pca_s <- pca_s[,c(1,2)]
pca_s$Samples = row.names(pca_s)
pca_s$treat=colData$treat
cbPalette <- c("darkgoldenrod2", "darkolivegreen3", "dodgerblue3")
ggplot(pca_s, aes(PC1, PC2, color = treat, pch = treat)) +
geom_point(size=3) +
# geom_text_repel(aes(label=Samples)) +
scale_colour_manual(values=cbPalette)+
theme_bw() +
# geom_density2d(alpha=.5)+
geom_polygon(alpha=.2)+
xlab(paste0("PC1: ",pc1v,"% variance")) +
ylab(paste0("PC2: ",pc2v,"% variance"))
Test if the groupings differ significantly by their PC scores on the PC1 and PC2 axis. Adonis is better in this case than ANOVA/MANOVA because it uses both the centroids and squared deviation of each sample to that centroid, so that it can detect significant differences when the groups center around each other, but the variance of each group from the centroid is different.
adonis(pca$x ~ treat, data = pca_s, method='eu', na.rm = TRUE) #No significance detected, p=0.3
## 'nperm' >= set of all permutations: complete enumeration.
## Set of permutations < 'minperm'. Generating entire set.
##
## Call:
## adonis(formula = pca$x ~ treat, data = pca_s, method = "eu", na.rm = TRUE)
##
## Permutation: free
## Number of permutations: 719
##
## Terms added sequentially (first to last)
##
## Df SumsOfSqs MeanSqs F.Model R2 Pr(>F)
## treat 1 4491.4 4491.4 1.0698 0.21101 0.3
## Residuals 4 16793.5 4198.4 0.78899
## Total 5 21284.9 1.00000
Create a heatmap of genes differentially expressed between the recovery and treatment groups
rldpvals <- read.csv(file="fav_Harvey_RLDandPVALS.csv", row.names=1)
head(rldpvals)
## fav_recovery fav_recovery.1 fav_recovery.2 fav_stress
## isogroup0 3.86719166 3.7491423 5.0074464 4.2935649
## isogroup100027 -1.85901635 -1.7377292 -1.8297627 -1.8207768
## isogroup100029 -1.79322273 -1.7293472 -1.7807991 -1.6844258
## isogroup100033 0.00000000 0.0000000 0.0000000 0.0000000
## isogroup10004 0.04380176 -0.1381963 -0.2723896 0.3921522
## isogroup100049 0.87721463 1.6645507 1.6893611 2.1251942
## fav_stress.1 fav_stress.2 pval.stress padj.stress
## isogroup0 3.9490624 3.9086622 0.8457227 0.9953109
## isogroup100027 -1.6015120 -1.7362945 0.6392950 NA
## isogroup100029 -1.7445438 -1.7285451 0.6179434 NA
## isogroup100033 0.0000000 0.0000000 NA NA
## isogroup10004 -0.2014918 0.6506082 0.5456178 NA
## isogroup100049 1.2066803 0.8932394 0.7601060 NA
rld_site= rldpvals[,1:6]
head(rld_site)
## fav_recovery fav_recovery.1 fav_recovery.2 fav_stress
## isogroup0 3.86719166 3.7491423 5.0074464 4.2935649
## isogroup100027 -1.85901635 -1.7377292 -1.8297627 -1.8207768
## isogroup100029 -1.79322273 -1.7293472 -1.7807991 -1.6844258
## isogroup100033 0.00000000 0.0000000 0.0000000 0.0000000
## isogroup10004 0.04380176 -0.1381963 -0.2723896 0.3921522
## isogroup100049 0.87721463 1.6645507 1.6893611 2.1251942
## fav_stress.1 fav_stress.2
## isogroup0 3.9490624 3.9086622
## isogroup100027 -1.6015120 -1.7362945
## isogroup100029 -1.7445438 -1.7285451
## isogroup100033 0.0000000 0.0000000
## isogroup10004 -0.2014918 0.6506082
## isogroup100049 1.2066803 0.8932394
gg=read.table("orb_fav_iso2gene.tab.txt",sep="\t", row.names=1)
head(gg)
## V2
## isogroup141339 Protein HIRA OS=Gallus gallus GN=HIRA PE=1 SV=2 E(blastx)=7e-34
## isogroup71425 ATPase ASNA1 homolog OS=Nematostella vectensis GN=v1g161623 PE=3 SV=1 E(blastx)=5e-108
## isogroup58561 Cullin-2 OS=Caenorhabditis elegans GN=cul-2 PE=1 SV=3 E(blastx)=3e-166
## isogroup60863 Valine--tRNA ligase OS=Takifugu rubripes GN=vars PE=3 SV=1 E(blastx)=7e-147
## isogroup51466 U1 snRNP-associated protein usp106 OS=Schizosaccharomyces pombe (strain 972 / ATCC 24843) GN=usp106 PE=1 SV=1 E(blastx)=9e-20
## isogroup36231 Tyrosinase OS=Gallus gallus GN=TYR PE=2 SV=1 E(blastx)=3e-118
nrow(rldpvals[rldpvals$padj.stress<0.01& !is.na(rldpvals$padj.stress),])
## [1] 19
topnum= 100 # number of DEGS
head(rldpvals)
## fav_recovery fav_recovery.1 fav_recovery.2 fav_stress
## isogroup0 3.86719166 3.7491423 5.0074464 4.2935649
## isogroup100027 -1.85901635 -1.7377292 -1.8297627 -1.8207768
## isogroup100029 -1.79322273 -1.7293472 -1.7807991 -1.6844258
## isogroup100033 0.00000000 0.0000000 0.0000000 0.0000000
## isogroup10004 0.04380176 -0.1381963 -0.2723896 0.3921522
## isogroup100049 0.87721463 1.6645507 1.6893611 2.1251942
## fav_stress.1 fav_stress.2 pval.stress padj.stress
## isogroup0 3.9490624 3.9086622 0.8457227 0.9953109
## isogroup100027 -1.6015120 -1.7362945 0.6392950 NA
## isogroup100029 -1.7445438 -1.7285451 0.6179434 NA
## isogroup100033 0.0000000 0.0000000 NA NA
## isogroup10004 -0.2014918 0.6506082 0.5456178 NA
## isogroup100049 1.2066803 0.8932394 0.7601060 NA
top100=head(rldpvals[order(rldpvals$padj.stress), ],topnum)
head(top100)
## fav_recovery fav_recovery.1 fav_recovery.2 fav_stress
## isogroup45425 1.578307 2.204950 2.872793 5.233677
## isogroup24484 11.984669 12.341940 14.587977 14.783647
## isogroup42660 4.643839 3.556030 4.958461 7.360883
## isogroup46163 7.702355 8.585987 6.173181 4.843954
## isogroup24251 5.499109 3.536387 5.955892 8.938884
## isogroup38800 3.162921 5.152066 4.289319 7.409656
## fav_stress.1 fav_stress.2 pval.stress padj.stress
## isogroup45425 4.031137 7.039794 1.653365e-11 7.579024e-08
## isogroup24484 15.098139 16.454547 4.076931e-11 9.344326e-08
## isogroup42660 6.226905 7.647572 1.256689e-09 1.826448e-06
## isogroup46163 5.387669 4.505555 1.593760e-09 1.826448e-06
## isogroup24251 5.954878 7.833701 7.303820e-08 6.696142e-05
## isogroup38800 5.550190 8.170632 2.615285e-07 1.498558e-04
length(top100[,1])
## [1] 100
summary(top100)
## fav_recovery fav_recovery.1 fav_recovery.2 fav_stress
## Min. :-0.159 Min. : 1.048 Min. : 0.5378 Min. : 1.313
## 1st Qu.: 3.042 1st Qu.: 3.551 1st Qu.: 3.2050 1st Qu.: 3.765
## Median : 4.874 Median : 5.119 Median : 4.5335 Median : 5.512
## Mean : 4.853 Mean : 5.316 Mean : 4.7759 Mean : 5.689
## 3rd Qu.: 6.512 3rd Qu.: 6.648 3rd Qu.: 6.0955 3rd Qu.: 7.222
## Max. :19.539 Max. :20.696 Max. :19.1481 Max. :18.092
## fav_stress.1 fav_stress.2 pval.stress padj.stress
## Min. : 1.172 Min. : 1.574 Min. :0.0000000 Min. :8.000e-08
## 1st Qu.: 3.223 1st Qu.: 4.148 1st Qu.:0.0002253 1st Qu.:4.006e-02
## Median : 5.095 Median : 5.916 Median :0.0011691 Median :1.051e-01
## Mean : 5.134 Mean : 5.952 Mean :0.0019460 Mean :1.219e-01
## 3rd Qu.: 6.438 3rd Qu.: 7.675 3rd Qu.:0.0033789 3rd Qu.:2.013e-01
## Max. :19.343 Max. :18.700 Max. :0.0062179 Max. :2.850e-01
library(pheatmap)
head(top100)
## fav_recovery fav_recovery.1 fav_recovery.2 fav_stress
## isogroup45425 1.578307 2.204950 2.872793 5.233677
## isogroup24484 11.984669 12.341940 14.587977 14.783647
## isogroup42660 4.643839 3.556030 4.958461 7.360883
## isogroup46163 7.702355 8.585987 6.173181 4.843954
## isogroup24251 5.499109 3.536387 5.955892 8.938884
## isogroup38800 3.162921 5.152066 4.289319 7.409656
## fav_stress.1 fav_stress.2 pval.stress padj.stress
## isogroup45425 4.031137 7.039794 1.653365e-11 7.579024e-08
## isogroup24484 15.098139 16.454547 4.076931e-11 9.344326e-08
## isogroup42660 6.226905 7.647572 1.256689e-09 1.826448e-06
## isogroup46163 5.387669 4.505555 1.593760e-09 1.826448e-06
## isogroup24251 5.954878 7.833701 7.303820e-08 6.696142e-05
## isogroup38800 5.550190 8.170632 2.615285e-07 1.498558e-04
p.val=0.1 # FDR cutoff
conds=top100[top100$padj.stress<=p.val & !is.na(top100$padj.stress),]
length(conds[,1])
## [1] 46
exp=conds[,1:6] # change numbers to be your vsd data columns
means=apply(exp,1,mean) # means of rows
explc=exp-means # subtracting them
head(explc)
## fav_recovery fav_recovery.1 fav_recovery.2 fav_stress
## isogroup45425 -2.2484694 -1.6218261 -0.95398338 1.4069005
## isogroup24484 -2.2238176 -1.8665461 0.37949086 0.5751603
## isogroup42660 -1.0884424 -2.1762517 -0.77382055 1.6286017
## isogroup46163 1.5025718 2.3862031 -0.02660246 -1.3558294
## isogroup24251 -0.7873658 -2.7500879 -0.33058313 2.6524089
## isogroup38800 -2.4595425 -0.4703981 -1.33314474 1.7871916
## fav_stress.1 fav_stress.2
## isogroup45425 0.20436085 3.213018
## isogroup24484 0.88965224 2.246060
## isogroup42660 0.49462283 1.915290
## isogroup46163 -0.81211466 -1.694228
## isogroup24251 -0.33159756 1.547225
## isogroup38800 -0.07227414 2.548168
ccol=colorRampPalette(rev(c("red","chocolate1","#FEE090","grey10", "cyan3","cyan")))(100)
col0=colorRampPalette(rev(c("chocolate1","#FEE090","grey10", "cyan3","cyan")))(100)
pheatmap(explc,cluster_cols=T,scale="row",color=col0, show_rownames = F)
Our DESeq2 and GO analysis do not match the results found by Wright et al. This is primarily due to the fact that early in our analysis we incorrectly split our data by treatments rather than by genotypes. We were instructed to continue with our analysis, but anticipate that if we were to rerun DESeq2 and GO we would find similar results to those in the paper.
Wright et al. found that the sub-lethal stress caused by the influx of freshwater from Harvey impacted the redox state and mitochondrial function of Orbicella and their associated algal symbionts. Previous studies show that extended periods of stress leads to increased risk of mortality, and given the increasing frequency of large storm systems in the Gulf of Mexico, it is very likely that coral in the FGB will continue to experience longer periods of stress.
This study was somewhat limited by only having two sampling periods– future research is needed to determine seasonal variation in gene expression.
Coralpedia - Your guide to Caribbean corals and sponges. (2021). Retrieved March 29, 2021, from https://coralpedia.bio.warwick.ac.uk/en/corals/montastraea_faveolata)
US Department of Commerce, N. (2020, July 13). Hurricane Harvey Info. Retrieved March 29, 2021, from https://www.weather.gov/hgx/hurricaneharvey
van Oldenborgh, G. J., van der Wiel, K., Sebastian, A., Singh, R., Arrighi, J., Otto, F., et al. (2017). Attribution of extreme rainfall from Hurricane Harvey, August 2017. Environ. Res. Lett. 12:124009. doi: 10.1088/1748-9326/aa9ef2
Wright, R.M., Correa, M.S., Quigley, L.A., Santiago-Vazquez, L.Z., Shamberger, K.E.F., and Davies, S.W. (2019). Gene expression of endangered coral (Orbicella spp.) in Flower Garden Banks National Marine Sanctuary after hurricane Harvey. Frontiers in Marine Science 6:672. https://doi.org/10.3389/fmars.2019.00672